Exploring weak scalability for FEM calculations on a GPU-enhanced cluster
Abstract
The first part of this paper surveys co-processor approaches for commodity-based clusters in general, not only with respect to raw performance, but also in view of their system integration and power consumption. We then extend previous work on a small GPU cluster by exploring the heterogeneous hardware approach for a large-scale system with up to 160 nodes. Starting with a conventional commodity-based cluster, we leverage the high bandwidth of graphics processing units (GPUs) to increase the overall system bandwidth, which is the decisive performance factor in this scenario. Thus, even the addition of low-end, out-of-date GPUs leads to improvements in both performance- and power-related metrics.
Similar resources
A scalable hybrid algorithm based on domain decomposition and algebraic multigrid for solving partial differential equations on a cluster of CPU/GPUs
Several of the top ranked supercomputers are based on the hybrid architecture consisting of a large number of CPUs and GPUs. Very high performance has been obtained for problems with special structures, such as FFT-based image processing or N-body based particle calculations. However, for the class of problems described by partial differential equations discretized by finite difference (or othe...
OPTIMAL SOLUTION OF RICHARDS’ EQUATION FOR SLOPE INSTABILITY ANALYSIS USING AN INTEGRATED ENHANCED VERSION OF BLACK HOLE MECHANICS INTO THE FEM
One of the most crucial problems in geo-engineering is the instability of unsaturated slopes, causing severe loss of life and property worldwide. In this study, five novel meta-heuristic methods are employed to optimize locating the Critical Failure Surface (CFS) and corresponding Factor of Safety (FOS). A Finite Element Method (FEM) code is incorporated to convert the strong form of the Richar...
General-purpose molecular dynamics simulations on GPU-based clusters
We present a GPU implementation of LAMMPS, a widely-used parallel molecular dynamics (MD) software package, and show 5x to 13x single node speedups versus the CPU-only version of LAMMPS. This new CUDA package for LAMMPS also enables multi-GPU simulation on hybrid heterogeneous clusters, using MPI for inter-node communication, CUDA kernels on the GPU for all methods working with particle data, a...
Scalable Breadth-First Search on a GPU Cluster
On a GPU cluster, the ratio of high computing power to communication bandwidth makes scaling breadth-first search (BFS) on a scale-free graph extremely challenging. By separating high and low out-degree vertices, we present an implementation with scalable computation and a model for scalable communication for BFS and direction-optimized BFS. Our communication model uses global reduction for high...
Scalability of parallel finite element algorithms on multi-core platforms
The speedup of element-by-element FEM algorithms depends not only on peak processor performance but also on access time to shared mesh data. Eliminating memory boundness would significantly speed up unstructured mesh computations on hybrid multi-core architectures, where the gap between processor and memory performance continues to grow. The speedup can be achieved by ordering unknowns so that ...
Journal:
- Parallel Computing
Volume 33
Publication date 2007